{% extends "documentation/docs_base.html" %}

{% block doc_title %}Super Search{% endblock %}

{% block doc_content %}
  <div class="body">
    <h1 id="what-is-super-search">What is Super Search?</h1>

    <p>
      Super Search is an interface to crash reports data. There is a
      <a href="{{ url('supersearch:search') }}">human-friendly interface</a> and a
      <a href="{{ url('api:documentation') }}#SuperSearch">public API</a>. This
      guide covers the API, but the interface can be used to easily query and
      visualize the data.
    </p>
    <p>
      There are 3 key points to Super Search. First, it is a secured window on
      all the raw crash reports data we have. Second, it allows you to use
      almost every field as a filter, giving you great power to explore the data
      we have. And third, it has advanced aggregation features that can be used
      to extensively analyze a subset of data.
    </p>
    <p>
      This documentation aims at extensively covering the API.
    </p>

    <h1 id="api-endpoints">API endpoints</h1>

    <h2 id="supersearch">SuperSearch</h2>

    <pre><code><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span></code></pre>

    <p>
      This is the public endpoint that you should mostly be using. It doesn't
      require any permissions, and gives you access to almost all of the data
      about crash reports. The only restrictions are fields potentially
      containing sensitive data (like PII, for Personally Identifiable
      Information), such as <code>email</code> or <code>url</code>.
    </p>

    <h2 id="supersearchunredacted">SuperSearchUnredacted</h2>

    <pre><code><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearchUnredacted') }}</span></code></pre>

    <p>
      This is a restricted endpoint, only accessible through the use of an
      <a href="{{ url('tokens:home') }}">API Token</a> and your having the right
      set of permissions. It gives you an exhaustive access to our entire
      dataset, so please use this very carefully.
    </p>

    <h1 id="basic-usage">Basic Usage</h1>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(product=product, _results_number=10) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(product=product, _results_number=10) }}</span></code></pre>
    </div>

    <h1 id="response">Response</h1>
    <p>
      The service returns a JSON document with a defined structure. The root
      will always contain the following 3 keys:
    </p>
    <p>
      <code>hits</code> contains a portion of the crash reports that matched the
      query. The number of results, the fields of those results, the ordering
      can all be controled using control parameters. It is an ordered list of
      JSON objects.
    </p>
    <p>
      <code>total</code> contains the total number of crash reports that matched
      the query. It is a positive integer.
    </p>
    <p>
      <code>facets</code> contains the results of the various aggregations that
      are set in the query. It is a JSON object containing ordered lists of JSON
      objects. Each sub-object represents a &quot;term&quot; and its count, and
      can have sub-aggregations. Theoratically, there can be any number of
      nested sub-aggregations.
    </p>

    <h1 id="filters">Filters</h1>
    <p>
      The first thing that Super Search allows you to do is to filter on our
      dataset to reduce it. For example, if you only want to see crashes that
      happened on a given product and version, or with a specific build id.
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(product=product, version=active_versions[product][0].version) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(product=product, version=active_versions[product][0].version) }}</span></code></pre>
    </div>

    <p>
      There are lots of available filters, listed in the
      <a href="{{url('documentation:supersearch_api') }}">API Reference</a>. That
      list is constantly changing as we add more data to our crash reports.
    </p>

    <h2 id="">Logic</h2>

    <p>
      When using several filters together, you do not have full control over how
      filter are combined together. Note that all filters can receive several
      values. Here are the rules:
    </p>
    <ul>
      <li>if there are several filters with the same name:
        <ul>
          <li>if the operator is a range (see Operators section below), combine all range filters with an <code>AND</code></li>
          <li>combine all remaining filters with an <code>OR</code></li>
        </ul>
      </li>
      <li>filters are combined with an <code>AND</code> operator</li>
    </ul>

    <p>
      For example, <code>product={{ product }} &amp; version=1.0 &amp;
      version=2.0 &amp; date=&gt;2000-01-01 &amp; date=&lt;2001-01-01</code>
      will translate to:
    </p>

    <pre>( product = {{ product }} ) <b>AND</b>
( version = 1.0 <b>OR</b> version = 2.0 ) <b>AND</b>
( date &gt; 2000-01-01 <b>AND</b> date &lt; 2001-01-01 )</pre>

    <h2 id="data-types">Data types</h2>

    <p>
      Each filter has a data type, that corresponds to the type of data the
      associated field can receive. Those types have a direct impact on the
      operators you can use for each field, as explained in the
      <a href="#operators">Operators section</a>. Here are the existing data types:
    </p>

    <table>
      <thead>
        <tr>
          <th>Data type</th>
          <th>Operators</th>
          <th>Description</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>enum</td>
          <td><code></code></td>
          <td>Simple type for strings, relies on inputs being turned into "terms". </td>
        </tr>
        <tr>
          <td>string</td>
          <td><code></code>, <code>~</code>, <code>=</code>, <code>$</code>, <code>^</code>, <code>@</code>, <code>__null__</code></td>
          <td>Advanced type for strings, gives a lot more options. </td>
        </tr>
        <tr>
          <td>number</td>
          <td><code></code>, <code>&gt;</code>, <code>&gt;=</code>, <code>&lt;</code>, <code>&lt;=</code></td>
          <td>Type for all numbers, underlying data can be float, int, long&hellip; </td>
        </tr>
        <tr>
          <td>date</td>
          <td><code>&gt;</code>, <code>&gt;=</code>, <code>&lt;</code>, <code>&lt;=</code></td>
          <td>Type for all dates, underlying data can be date or datetime. </td>
        </tr>
        <tr>
          <td>boolean</td>
          <td><code>__true__</code></td>
          <td>Type for fields that can be true or false.</td>
        </tr>
        <tr>
          <td>flag</td>
          <td><code>__null__</code></td>
          <td>Type for fields that either have a value (generally <code>1</code>) or are missing.</td>
        </tr>
      </tbody>
    </table>

    <h2 id="operators">Operators</h2>

    <p>
      Each filter supports a number of operators, that will allow you to refine
      your searches. Concretely, those operators translate to a string that will
      be put at the beginning of the value of the parameter. For example,
      <code>~</code> is the "contains" operator. If you want to search for all
      crashes that have a signature <i>containing</i> <code>moz</code>, you
      would do this:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(signature='~moz') }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(signature='~moz') }}</span></code></pre>
    </div>

    <p>
      Some operators (like <code>__null__</code> or <code>__true__</code>) do
      not care about a value, you can use them alone. All operators can be
      negated using the meta operator <code>!</code> as a prefix. Here's how to
      search for all crashes with an address that is not empty:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(address='!__null__') }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(address='!__null__') }}</span></code></pre>
    </div>

    <p>
      Most operators are only usable with some data types. For example,
      comparison operators (like <code>&lt;</code> or <code>&gt;</code>) are not
      usable with string data, but only with numbers and dates. Operators for
      each filter are listed on the
      <a href="{{ url('documentation:supersearch_api') }}">API Reference</a> page.
    </p>
    <p>
      Here is a list of all operators, and what data types they apply to:
    </p>

    <table>
      <thead>
        <tr>
          <th>Operator</th>
          <th>Code</th>
          <th>Data types</th>
          <th>Description</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>not</td>
          <td><code>!</code></td>
          <td><i>all</i></td>
          <td>Meta operator, to put before any other operator to turn it into the opposite.</td>
        </tr>
        <tr>
          <td>has terms</td>
          <td><code></code></td>
          <td>enum, number, string</td>
          <td>Default operator. The field contains at least once the given substring. </td>
        </tr>
        <tr>
          <td>contains</td>
          <td><code>~</code></td>
          <td>string</td>
          <td>The field contains at least once the given substring.</td>
        </tr>
        <tr>
          <td>is exactly</td>
          <td><code>=</code></td>
          <td>string</td>
          <td>The field is exactly the given substring.</td>
        </tr>
        <tr>
          <td>starts with</td>
          <td><code>^</code></td>
          <td>string</td>
          <td>The field starts with the given substring.</td>
        </tr>
        <tr>
          <td>ends with</td>
          <td><code>$</code></td>
          <td>string</td>
          <td>The field ends with the given substring.</td>
        </tr>
        <tr>
          <td>matches regex</td>
          <td><code>@</code></td>
          <td>string</td>
          <td>The field matches the given regular expression. The accepted syntax is described in the <a href="https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-regexp-query.html#regexp-syntax">Elasticsearch documentation</a>.</td>
        </tr>
        <tr>
          <td>is null</td>
          <td><code>__null__</code></td>
          <td>string, flag</td>
          <td>The field is missing or has a null value.</td>
        </tr>
        <tr>
          <td>greater than</td>
          <td><code>&gt;</code></td>
          <td>date, number</td>
          <td>The field is greater than the given date or number.</td>
        </tr>
        <tr>
          <td>greater than or equal</td>
          <td><code>&gt;=</code></td>
          <td>date, number</td>
          <td>The field is greater than or equal the given date or number.</td>
        </tr>
        <tr>
          <td>lower than</td>
          <td><code>&lt;</code></td>
          <td>date, number</td>
          <td>The field is lower than the given date or number.</td>
        </tr>
        <tr>
          <td>lower than or equal</td>
          <td><code>&lt;=</code></td>
          <td>date, number</td>
          <td>The field is lower than or equal the given date or number.</td>
        </tr>
        <tr>
          <td>is true</td>
          <td><code>__true__</code></td>
          <td>boolean</td>
          <td>The field has a value that is truthy.</td>
        </tr>
      </tbody>
    </table>

    <h1 id="meta-parameters">Meta Parameters</h1>

    <p>
      Along with filters, Super Search exposes a set of "meta" parameters. Their
      names start with an underscore, like <code>_results_number</code>. These
      parameters control various aspects of the results in the <code>hits</code>
      key. They are pretty straight forward and are well described in the
      <a href="{{ url('documentation:supersearch_api') }}">API Reference</a>.
    </p>
    <p>
      <code>_results_number</code> and <code>_results_offset</code> allow you to
      paginate over the results of a request. It is generally safer, when you
      need a lot of results, to run the same request in a loop while
      incrementing <code>_results_offset</code> of the value of
      <code>_results_number</code>, instead of just requesting an arbitraty
      large number of results at once.
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(_results_number=10, _results_offset=100) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(_results_number=10, _results_offset=100) }}</span></code></pre>
    </div>

    <p>
      Using <code>_sort</code>, you can, well, sort the results of a request.
      This parameter accepts an ordered list of fields. By default, the sorting
      will be ascendant. If you want to sort a field by descending order, you
      can prefix the field name with a minus sign (<code>-</code>).
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(_sort=['platform', '-build_id']) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(_sort=['platform', '-build_id']) }}</span></code></pre>
    </div>

    <p>
      Each returned document only contains a subset of the available fields. By
      default, those fields are <code>uuid</code>, <code>date</code>,
      <code>signature</code>, <code>product</code> and <code>version</code>. You
      can control that list with the <code>_columns</code> parameter, that
      accepts a list of fields.
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(_columns=['platform', 'build_id', 'uptime']) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(_columns=['platform', 'build_id', 'uptime']) }}</span></code></pre>
    </div>

    <h1 id="aggregations">Aggregations</h1>

    <div class="side-note">
      <p>
        "Facets" and "aggregations" are the same thing. Those names come from
        Elasticsearch, which renamed the feature to "aggregations" in its
        version 1.0. We are thus using the name "aggregations", or its
        abbreviation "aggs", in all new features.
      </p>
    </div>

    <p>
      Aggregations are all about numbers. Or rather, they give you all the
      numbers about a dataset. There are several types of aggregations,
      depending of the type of data you're interested in, but they have in
      common that they are used for counting things. For example, you would use
      aggregations to:
    </p>
    <ul>
      <li>get the top signatures in a given dataset;</li>
      <li>count the number of crash reports on each version of your product;</li>
      <li>get the count of crashes per build per date...</li>
    </ul>
    <p>
      Aggregations can be queried in various ways, but always come out in the
      same structure, in the <code>facets</code> key of the returned document.
      The structure is as follows:
    </p>
    <pre><code>{
    "facet_name": [
        {
            "term": "foo",
            "count": 42,
            ["facets": {
                "other_facet_name": []
            }]
        }
    ]
}</code></pre>

    <p>
      Aggregations can theoratically be infinitely nested, but are currently
      limited by the available arguments to 3 levels.
    </p>
    <p>
      There are 3 ways of getting aggregations. The first and simplest one is to
      use <code>_facets</code>. The second is to use
      <code>_aggs.field_name</code> and its derivatives, like
      <code>_aggs.field1.field2</code>. And the third one is to use special
      aggregations, like <code>_histogram.field_name</code> or
      <code>_cardinality.field_name</code>.
    </p>

    <h2 id="basic-aggregations">Basic aggregations</h2>
    <p>
      Using the <code>_facets</code> parameter allows you to run simple
      aggregations. You can pass it a field name (see the
      <a href="{{ url('documentation:supersearch_api') }}#section-list-of-fields">List of fields</a>),
      and it will count the terms of that field in all the documents
      of the dataset. It will then return them sorted by descending count.
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(_facets='signature') }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(_facets='signature') }}</span></code></pre>
    </div>

    <p>
      This will return the list of signatures that appear the most in all crash
      reports for the last week, sorted by descending count.
    </p>
    <p>
      The number of results in an aggregation is 50 by default. If you want to
      change it, you can use the <code>_facets_size</code> parameter. Note that
      there is no way to paginate over aggregation results, and that the bigger
      the <code>_facets_size</code> the longer the request will take.
    </p>

    <h2 id="nested-aggregations">Nested aggregations</h2>

    <p>
      Let's say you want to get the count of platforms per product. To do that,
      you would use a nested aggregation: aggregate on products, and then for
      each product aggregate on platforms. Here's what that translates to in
      Super Search:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(**{'_aggs.product': 'platform'}) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(**{'_aggs.product': 'platform'}) }}</span></code></pre>
    </div>

    <p>
      The list of all available nested aggregations is in the
      <a href="{{ url('documentation:supersearch_api') }}#section-nested-aggregations-list">API Reference</a>.
    </p>

    <h2 id="histograms">Histograms</h2>

    <p>
      The previous aggregations work great for string fields, but what if you
      want to aggregate on a date for example? Or if you want to manipulate
      ranges of numbers? That is what histograms are made for. Histograms work
      with dates and numbers, and allow you to treat that data as ranges instead
      of "terms". For example, if you want to get a count of crashes per product
      per day, you would run a histogram on the <code>date</code> field, using
      an interval of <i>1d</i> (one day). Here's what it looks like:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(**{'_histogram.date': 'product', '_histogram_interval.date': '1d'}) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(**{'_histogram.date': 'product', '_histogram_interval.date': '1d'}) }}</span></code></pre>
    </div>

    <p>
      This works as follows: for each value of the field, it will find in which
      range it belongs, and put it in a bucket for that range. In our case, each
      date value will be put in the corresponding day bucket. Then it counts the
      number of document per bucket, and will run any other aggregation in each
      bucket. If a bucket is empty, it will not be returned, so missing values
      in the results simply mean that they had zero documents.
    </p>

    <p>
      The list of accepted units for date intervals is available in the <a
      href="https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#time-units">Elasticsearch
      documentation</a>.
    </p>

    <p>
      For numbers, you do not need a unit, just a number. Let's take the
      <code>uptime</code> field for example. If you want to count the number of
      crashes happening at startup (let's say startup is up to 60 seconds after
      launching the app), here is what you can do:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(**{'_histogram.uptime': 'product', '_histogram_interval.uptime': 60}) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(**{'_histogram.uptime': 'product', '_histogram_interval.uptime': 60}) }}</span></code></pre>
    </div>

    <h2 id="cardinality">Cardinality</h2>

    <p>
      Sometimes you just need to count the distinct values of a field. That can
      be done using the cardinality feature. For example, let's count the number
      of distinct builds of {{ product }}:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(product=product, _facets='_cardinality.build_id') }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(product=product, _facets='_cardinality.build_id') }}</span></code></pre>
    </div>

    <p>
      Note that the results of a cardinality aggregation are different from
      others. There is a single key in the results, called <code>value</code>,
      and it cannot contain sub-aggregations. Also, there is no parameter for
      cardinality like there is for histograms, you can only use it as part of
      another aggregation parameter.
    </p>

    <h2 id="combining-aggregations">Combining aggregations</h2>

    <p>
      Did you notice, in that last example? We did not use
      <code>_cardinality</code> as a parameter but as the value of the
      <code>_facets</code> parameter. It is indeed possible to combine
      aggregations in various ways. Notably, <code>_histogram.field</code> and
      <code>_cardinality.field</code> can be used as values of
      <code>_facets</code> or <code>_aggs.field</code>. Let us, for example,
      count the number of distinct values of <code>install_time</code> per
      product per version:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(**{'_aggs.product.version': '_cardinality.install_time'}) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(**{'_aggs.product.version': '_cardinality.install_time'}) }}</span></code></pre>
    </div>

    <p>
      Or, let's count the number of distinct <code>build_id</code> per day for
      {{ product }}:
    </p>

    <div class="example">
      <pre><code data-url="{{ url('api:model_wrapper', model_name='SuperSearch') }}?{{ make_query_string(**{'product': product, '_histogram.date': '_cardinality.build_id', '_histogram_interval.date': '1d'}) }}"><span class="url-domain">{{ full_url(request, 'api:model_wrapper', model_name='SuperSearch') }}</span><span class="query-string">?{{ make_query_string(**{'product': product, '_histogram.date': '_cardinality.build_id', '_histogram_interval.date': '1d'}) }}</span></code></pre>
    </div>

    <h1 id="what-s-next">What's next?</h1>

    <p>
      If you want to see more concrete examples of problems you can solve with
      Super Search, head to the
      <a href="{{ url('documentation:supersearch_examples') }}">Examples</a>
      page. If you want an exhaustive list of parameters, check out the
      <a href="{{ url('documentation:supersearch_api') }}">API Reference</a> page.
    </p>
  </div>
{% endblock %}
